Lya Laberge, General Dynamics C4 Systems, lya.laberge@gdc4s.com PRIMARY
Sid Kaul, General Dynamics C4 Systems, sid.kaul@gdc4s.com
Naomi Anderson, General Dynamics C4 Systems, naomi.anderson@gdc4s.com
Charles Agnew, General Dynamics C4 Systems, charles.agnew@gdc4s.com
David Goldstein, General Dynamics C4 Systems, david.goldstein@gdc4s.com
Jake Kolojejchick, General Dynamics C4 Systems, john.kolojejchick@gdc4s.com
Student Team: NO
CoMotion®
Video:
Answers to Mini-Challenge 1 Questions:
MC 1.1 Create a visualization of the health and policy status of the entire Bank of Money enterprise as of 2 pm BMT (BankWorld Mean Time) on February 2. What areas of concern do you observe?
A composite health and policy status snapshot of Bank of Money, created for February 2 at 2 PM BMT (BankWorld Mean Time), can be seen below. Analysis of the 2:00 PM data snapshot indicates that Region 5 and Region 10 have zero occurrences of Policy Status 1 events and a high number of Policy Status 2 events, suggesting that these regions are operating in a sub-optimal state.
Additionally, the heat map below aggregates the Policy and Activity statuses as of 2:00 PM BMT and shows some outliers that warrant deeper analysis. For example, events combining Policy Status 5 (“Machine has a possible virus infection and/or questionable files have been found”) with Activity Flag 1 (“Normal/Healthy”) suggest improper behavior worth investigating, especially since this particular chart contains only one such event. The chart filters out events at the intersections of Activity Flag 1 with Policy Status 1 and Policy Status 2, because those counts are so high that they obscure the rest of the data patterns.
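For reference, the aggregation behind this heat map can be reproduced outside CoMotion. The sketch below is illustrative only: the file name and column names (policy_status, activity_flag) are assumptions rather than the actual Bank of Money field names. It pivots event counts by status pair and masks the dominant Activity Flag 1 / Policy Status 1 and 2 cells so the remaining outliers stand out.

```python
import pandas as pd

# Hypothetical snapshot export; file and column names are assumptions.
events = pd.read_csv("bom_health_2pm.csv")

# Count events for every Policy Status x Activity Flag combination.
heat = (events.groupby(["policy_status", "activity_flag"])
              .size()
              .unstack(fill_value=0))

# The Activity Flag 1 / Policy Status 1-2 cells dwarf everything else,
# so zero them out before looking for outliers such as Policy Status 5
# paired with a "Normal/Healthy" activity flag.
masked = heat.copy()
masked.loc[[1, 2], 1] = 0
print(masked)
```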
The final chart is a map view of the entire network during the 2 PM timeframe. Larger circles indicate locations where more events took place, and the colors represent differing time zones. In general, the circles correspond fairly well to the layout of the world: the largest circles are the datacenters, the next largest are the large regional offices, and so on.
The snapshot view shows that a datacenter at the top of the map does not contain the same number of events as the other datacenters. In fact, Datacenter 5 seems to have as few events as a small regional office. There are multiple possible reasons for this, each of which needs to be analyzed and evaluated.
MC 1.2 Use your visualization tools to look at how the network’s status changes over time. Highlight up to five potential anomalies in the network and provide a visualization of each. When did each anomaly begin and end? What might be an explanation of each anomaly?
The initial network snapshot work done on February 2, 2012 led us to investigate several possible issues further. Several different global views of the network data gave us starting points for further research. The first view compares the pattern of total events per region to the Policy Status deviations.
Even without knowing that regions 1 through 10 are the larger regional offices, this visualization shows more events in those regions than in the others. The only region with considerably more events is Headquarters. In general, the event distribution follows a standard pattern matching the asset distribution.
This visualization shows that regions 5 and 10 entirely lack Policy Status 1 events. The chart also shows an unusual number of Policy Status 2 events in these regions. Thus, we find the first anomaly to explore.
The second visualization shows the pattern of events by business unit over time. The colors encode the Policy Status, and the size of the tick marks denotes how many events of that kind occurred in each time interval. This time-interval comparison is in BMT.
There is a suspicious spread in the redder colors of the spectrum, indicating that as time goes on the entire network seems to slowly become compromised. We need to do more research on where this began and try to draw some conclusions.
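The worsening trend can also be quantified directly. The following is a minimal sketch under the same caveats (hypothetical file and column names): it bins events by hour in BMT and tracks the fraction of events reporting Policy Status 3 or worse, which should climb steadily if the network is slowly becoming compromised.

```python
import pandas as pd

# Hypothetical export; healthtime is assumed to be a BMT timestamp.
events = pd.read_csv("bom_health_all.csv", parse_dates=["healthtime"])

# Fraction of events at Policy Status 3 or worse, per hour.
events["bad"] = events["policy_status"] >= 3
share_bad = events.set_index("healthtime")["bad"].resample("1H").mean()

print(share_bad)  # a steadily rising fraction matches the reddening pattern
```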
The final chart is the map we used for the 2 PM snapshot, only now it contains the entire data set. All five datacenters and ten regional offices are displayed more prominently due to the number of events in those locations. One flaw of this visual is that main headquarters is eclipsed by the datacenter circle, so further investigation is performed on the headquarters location independently.
In line with the pattern that we witnessed in the 2 PM snapshot, the upper-most datacenter circle is still smaller than the rest. This is Datacenter 5, which in the 2 PM snapshot showed a significant difference in the number of events compared to the other datacenters. The size of the circle tells us that this issue extends beyond what we saw in the single snapshot. We will drill down into this occurrence.
Anomaly 1:
Regions 5 and 10 show a very different pattern from the other regions in terms of their Policy Status, as the comparison between Region 5 and Region 7 below demonstrates. Further drill-down reinforced that their policy status alerts start at 2 and increase. This suggests that regions are probably patched as a whole and that these two regions are behind the others. The anomaly persists throughout the entire timeframe of the available data.
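This pattern can be confirmed without inspecting each region's chart individually. A minimal sketch, again with hypothetical column names, tabulates policy status counts per region and flags regions that never report a Policy Status 1 event:

```python
import pandas as pd

events = pd.read_csv("bom_health_all.csv")  # hypothetical export

# Event counts per region and policy status (statuses 1-5 assumed).
counts = (events.groupby(["region", "policy_status"])
                .size()
                .unstack(fill_value=0)
                .reindex(columns=range(1, 6), fill_value=0))

# Regions that never report a healthy Policy Status 1 event.
never_healthy = counts.index[counts[1] == 0].tolist()
print(never_healthy)  # expected to contain regions 5 and 10
```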
Anomaly 2:
Based on the coloration of the events by time and business unit, we delved into why the Policy Status of the entire network got progressively worse throughout the two days. The investigation led us to one compromised IP address, 172.2.194.20, at Datacenter 2. We tracked it through time to determine when the initial infection or file corruption occurred.
In the chart above, the compromised server was already failing to report normal Policy Status events. At around midnight in the server's time zone, its status jumps from 2 to 3, and then at 3:45 AM local time the policy status shows the machine is compromised. This is ground zero. The infection then spreads to different branches in different regions and soon becomes uncontained.
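The same timeline can be extracted programmatically. A minimal sketch (hypothetical column names, and with the mapping from status codes to "compromised" treated as an assumption) isolates 172.2.194.20 and reports when each policy status level first appears for that machine:

```python
import pandas as pd

events = pd.read_csv("bom_health_all.csv", parse_dates=["healthtime"])

# All events reported by the suspect machine, in time order.
machine = (events[events["ip"] == "172.2.194.20"]
           .sort_values("healthtime"))

# First time each policy status level appears for this machine; the jump
# from 2 to 3 around local midnight and the later compromised status
# bracket when the infection took hold.
first_seen = machine.groupby("policy_status")["healthtime"].min()
print(first_seen)
```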
How the initial infection occurred is not clear. Activity levels showed no one trying to log into the machine and getting locked out, and no one inserting a peripheral device that could have contained a virus. The server that spread the infection was a compute server, a class with little definition as to its place in the network and how other machines connect to it. A probable explanation is that the machine, not being patched appropriately (a Policy Status of 2), was left vulnerable to external attack.
Anomaly 3:
The third anomaly involves Datacenter 5. When investigating why there were fewer total events in that datacenter compared to the others, we noticed that many of its machines seem to be down until noon local time, at which point they all become available again. The sudden increase in available servers as of noon local time points to scheduled downtime for this particular datacenter. Our hypothesis is that maintenance of the datacenter began in the middle of the night but ran long. In the included screenshot, the chart on the left shows a datacenter in the same time zone with an expected number of events, while the one on the right shows Datacenter 5 with a distinct lack of events until noon.
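The reporting gap can be checked numerically as well. A minimal sketch (hypothetical columns, including a localtime field assumed to be derivable from each region's time zone, and hypothetical facility labels) compares hourly event counts for Datacenter 5 against another datacenter in the same time zone:

```python
import pandas as pd

events = pd.read_csv("bom_health_all.csv", parse_dates=["localtime"])

# Hourly event counts for Datacenter 5 and a comparison datacenter.
subset = events[events["facility"].isin(["datacenter-5", "datacenter-4"])]
per_hour = (subset.set_index("localtime")
                  .groupby("facility")
                  .resample("1H")
                  .size()
                  .unstack(level=0))

# Datacenter 5 should show near-zero counts until roughly noon local time,
# while the comparison datacenter reports at a steady rate.
print(per_hour)
```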
Anomaly 4:
The screenshot for Anomaly 1 and several other charts we viewed show a corporation-wide lack of compliance with business rules regarding shutting down workstations in the evenings. This lack of compliance leaves those workstations physically unsecured, with the possibility of virus-laden peripherals being inserted while employees are absent from the location. The image below shows workstations remaining up between 6 PM and 7 AM local time, even at main headquarters. While the network does not currently show unusual activity levels during non-business hours, this still poses a potential problem.
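This compliance check reduces to a simple filter. A minimal sketch, assuming hypothetical localtime and machine_class columns, counts workstation health reports that arrive between 6 PM and 7 AM local time:

```python
import pandas as pd

events = pd.read_csv("bom_health_all.csv", parse_dates=["localtime"])

# Workstations should be shut down overnight; any health reports between
# 18:00 and 07:00 local time indicate machines left running after hours.
workstations = events[events["machine_class"] == "workstation"]
hour = workstations["localtime"].dt.hour
after_hours = workstations[(hour >= 18) | (hour < 7)]

print(after_hours.groupby("region").size().sort_values(ascending=False))
```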
Mini-Challenge 1 begins with analysis of a single event-reporting time slice. During investigation of this time slice, Region 5, Region 10, and Headquarters show anomalous behavior. Expanding the analysis to the full data set, we discovered policy compliance issues across the corporation, such as users leaving workstations up at the end of the day, a long-running maintenance cycle in Datacenter 5, a lag in appropriate patching for Regions 5 and 10, and a viral infection introduction point at Datacenter 2.